AITopics | interface element

interface element

InSight-R: A Framework for Risk-informed Human Failure Event Identification and Interface-Induced Risk Assessment Driven by AutoGraph

Xiao, Xingyu, Tong, Jiejuan, Chen, Peng, Sun, Jun, Sui, Zhe, Liang, Jingang, Zhao, Hongru, Zhao, Jun, Wang, Haitao

arXiv.org Artificial Intelligence, Jul-2-2025

Human reliability remains a critical concern in safety-critical domains such as nuclear power, where operational failures are often linked to human error. While conventional human reliability analysis (HRA) methods have been widely adopted, they rely heavily on expert judgment for identifying human failure events (HFEs) and assigning performance influencing factors (PIFs). This reliance introduces challenges related to reproducibility, subjectivity, and limited integration of interface-level data. In particular, current approaches lack the capacity to rigorously assess how human-machine interface design contributes to operator performance variability and error susceptibility. To address these limitations, this study proposes a framework for risk-informed human failure event identification and interface-induced risk assessment driven by AutoGraph (InSight-R). By linking empirical behavioral data to the interface-embedded knowledge graph (IE-KG) constructed by the automated graph-based execution framework (Auto-Graph), the InSight-R framework enables automated HFE identification based on both error-prone and time-deviated operational paths. Furthermore, we discuss the relationship between designer-user conflicts and human error. This framework offers actionable insights for interface design optimization and contributes to the advancement of mechanism-driven HRA methodologies.

Keywords: Knowledge-Graph-Driven, Automated, Interface-Induced Risk, Human Error Identification

1 Introduction

Human error remains a leading contributor to failures in complex socio-technical systems such as nuclear power plants, aviation, and healthcare, where safety-critical operations depend on accurate and timely human decisions [1, 2]. Human reliability analysis (HRA) methods have been widely used to model operator behavior and assess the likelihood of human failure events (HFEs) [3]. However, prevailing HRA approaches are often constrained by their reliance on expert judgment, particularly in the identification of HFEs and the assignment of performance influencing factors (PIFs) [3, 4]. In traditional HRA frameworks such as the integrated human event analysis system for event and condition assessment (IDHEAS-ECA), HFEs are primarily determined through expert elicitation, a process that, while practical, suffers from limited reproducibility, insufficient transparency, and weak theoretical grounding [5].
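
As a rough illustration of what flagging "error-prone and time-deviated operational paths" could look like, the sketch below groups observed trials by interface path and marks paths whose error rate or mean completion time exceeds a threshold. This is not the InSight-R implementation: the function name `flag_paths`, the trial tuple layout, the `expected_time` table, and both thresholds are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def flag_paths(trials, expected_time, max_error_rate=0.2, max_time_ratio=1.5):
    """Flag operational paths as error-prone and/or time-deviated.

    trials: iterable of (path, duration_seconds, had_error) tuples, where
            path is a tuple of interface element names (hypothetical layout).
    expected_time: dict mapping each path to its nominal duration in seconds
                   (an assumed input, e.g. a designer's estimate).
    """
    grouped = defaultdict(list)
    for path, duration, had_error in trials:
        grouped[path].append((duration, had_error))

    flags = {}
    for path, observations in grouped.items():
        # Share of trials on this path that ended in an error.
        error_rate = sum(1 for _, err in observations if err) / len(observations)
        # Mean observed duration relative to the nominal duration.
        time_ratio = mean(d for d, _ in observations) / expected_time[path]

        reasons = []
        if error_rate > max_error_rate:
            reasons.append("error-prone")
        if time_ratio > max_time_ratio:
            reasons.append("time-deviated")
        if reasons:
            flags[path] = reasons
    return flags


# Hypothetical data: three paths through an operator interface.
trials = [
    (("alarm", "valve"), 10.0, False),
    (("alarm", "valve"), 12.0, True),
    (("menu", "pump"), 30.0, False),
    (("menu", "pump"), 34.0, False),
    (("panel", "reset"), 5.0, False),
]
expected_time = {
    ("alarm", "valve"): 10.0,
    ("menu", "pump"): 20.0,
    ("panel", "reset"): 5.0,
}
flags = flag_paths(trials, expected_time)
```

In this toy data the first path is flagged for its error rate and the second for its duration, while the third passes both checks. A real analysis would presumably derive the paths from the knowledge graph rather than from hand-written tuples.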

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.00066

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Energy > Power Industry > Utilities > Nuclear (0.90)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Understanding GUI Agent Localization Biases through Logit Sharpness

Tao, Xingjian, Wang, Yiwei, Cai, Yujun, Yang, Zhicheng, Tang, Jing

arXiv.org Artificial Intelligence, Jun-19-2025

Multimodal large language models (MLLMs) have enabled GUI agents to interact with operating systems by grounding language into spatial actions. Despite their promising performance, these models frequently exhibit hallucinations: systematic localization errors that compromise reliability. We propose a fine-grained evaluation framework that categorizes model predictions into four distinct types, revealing nuanced failure modes beyond traditional accuracy metrics. To better quantify model uncertainty, we introduce the Peak Sharpness Score (PSS), a metric that evaluates the alignment between semantic continuity and logits distribution in coordinate prediction. Building on this insight, we further propose Context-Aware Cropping, a training-free technique that improves model performance by adaptively refining input context. Extensive experiments demonstrate that our framework and methods provide actionable insights and enhance the interpretability and robustness of GUI agent behavior.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2506.15425

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Fatigue-Aware Adaptive Interfaces for Wearable Devices Using Deep Learning

Wang, Yikan

arXiv.org Artificial Intelligence, Jun-17-2025

Wearable devices, such as smartwatches and head-mounted displays, are increasingly used for prolonged tasks like remote learning and work, but sustained interaction often leads to user fatigue, reducing efficiency and engagement. This study proposes a fatigue-aware adaptive interface system for wearable devices that leverages deep learning to analyze physiological data (e.g., heart rate, eye movement) and dynamically adjust interface elements to mitigate cognitive load. The system employs multimodal learning to process physiological and contextual inputs and reinforcement learning to optimize interface features like text size, notification frequency, and visual contrast. Experimental results show an 18% reduction in cognitive load and a 22% improvement in user satisfaction compared to static interfaces, particularly for users engaged in prolonged tasks. This approach enhances accessibility and usability in wearable computing environments.

artificial intelligence, human computer interaction, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2506.13203

Country:

North America > United States (0.15)
Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology > Hardware (0.86)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Model-Brained GUI Agents: A Survey

Zhang, Chaoyun, He, Shilin, Qian, Jiaxu, Li, Bowen, Li, Liqun, Qin, Si, Kang, Yu, Ma, Minghua, Liu, Guyue, Lin, Qingwei, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi

arXiv.org Artificial Intelligence, Dec-30-2024

GUIs have long been central to human-computer interaction, providing an intuitive and visually driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents.

graphical user interface, human-computer interaction, language model-brained gui agent, (16 more...)

arXiv.org Artificial Intelligence

2411.18279

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
(8 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(6 more...)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Sharingan: Extract User Action Sequence from Desktop Recordings

Chen, Yanting, Ren, Yi, Qin, Xiaoting, Zhang, Jue, Yuan, Kehong, Han, Lu, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi

arXiv.org Artificial Intelligence, Nov-13-2024

Video recordings of user activities, particularly desktop recordings, offer a rich source of data for understanding user behaviors and automating processes. However, despite advancements in Vision-Language Models (VLMs) and their increasing use in video analysis, extracting user actions from desktop recordings remains an underexplored area. This paper addresses this gap by proposing two novel VLM-based methods for user action extraction: the Direct Frame-Based Approach (DF), which inputs sampled frames directly into VLMs, and the Differential Frame-Based Approach (DiffF), which incorporates explicit frame differences detected via computer vision techniques. We evaluate these methods using a basic self-curated dataset and an advanced benchmark adapted from prior work. Our results show that the DF approach achieves an accuracy of 70% to 80% in identifying user actions, with the extracted action sequences being re-playable through Robotic Process Automation. We find that while VLMs show potential, incorporating explicit UI changes can degrade performance, making the DF approach more reliable. This work represents the first application of VLMs for extracting user action sequences from desktop recordings, contributing new methods, benchmarks, and insights for future research.
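
To make the idea of "explicit frame differences" concrete, here is a minimal sketch that compares two grayscale frames pixel by pixel and returns the bounding box of the region that changed. It is a stand-in for illustration only: the abstract does not say which computer vision techniques DiffF uses, and the function name `changed_region`, the nested-list frame format, and the threshold value are all assumptions.

```python
def changed_region(prev, curr, threshold=16):
    """Return the bounding box of pixels that differ between two frames.

    prev, curr: equally sized 2D lists of grayscale values (0-255).
    threshold: minimum absolute difference that counts as a change
               (an arbitrary value chosen for this sketch).
    Returns (top, left, bottom, right) with inclusive indices, or None
    when no pixel changed by more than the threshold.
    """
    # Rows that contain at least one changed pixel.
    rows = [
        y
        for y, (prev_row, curr_row) in enumerate(zip(prev, curr))
        if any(abs(a - b) > threshold for a, b in zip(prev_row, curr_row))
    ]
    if not rows:
        return None

    # Columns that contain at least one changed pixel within those rows.
    cols = [
        x
        for x in range(len(curr[0]))
        if any(abs(prev[y][x] - curr[y][x]) > threshold for y in rows)
    ]
    return (rows[0], cols[0], rows[-1], cols[-1])


# Hypothetical 4x5 frames: two pixels light up between consecutive frames.
previous_frame = [[0] * 5 for _ in range(4)]
current_frame = [row[:] for row in previous_frame]
current_frame[1][2] = 255
current_frame[2][3] = 200

box = changed_region(previous_frame, current_frame)
```

A pipeline in the spirit of DiffF might crop or annotate the frame with such a box before passing it to the model. Note that the abstract reports this kind of explicit difference signal can hurt accuracy rather than help.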

operation, scroll, sequence, (17 more...)

arXiv.org Artificial Intelligence

2411.08768

Country:

Europe > Spain > Aragón (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator

Datta, Siddhartha, Kollnig, Konrad, Shadbolt, Nigel

arXiv.org Artificial Intelligence, Feb-1-2022

Digital harms are widespread in the mobile ecosystem. As these devices gain ever more prominence in our daily lives, so too increases the potential for malicious attacks against individuals. The last line of defense against a range of digital harms - including digital distraction, political polarisation through hate speech, and children being exposed to damaging material - is the user interface. This work introduces GreaseTerminator to enable researchers to develop, deploy, and test interventions against these harms with end-users. We demonstrate the ease of intervention development and deployment, as well as the broad range of harms potentially covered with GreaseTerminator in five in-depth case studies.

artificial intelligence, intervention, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3490099.3511152

2112.10699

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > New York > New York County > New York City (0.05)
(20 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.94)

Technology:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Decomposed Inductive Procedure Learning

Weitekamp, Daniel, MacLellan, Christopher, Harpstead, Erik, Koedinger, Kenneth

arXiv.org Artificial Intelligence, Oct-25-2021

Recent advances in machine learning have made it possible to train artificially intelligent agents that perform with super-human accuracy on a great diversity of complex tasks. However, the process of training these capabilities often necessitates millions of annotated examples -- far more than humans typically need in order to achieve a passing level of mastery on similar tasks. Thus, while contemporary methods in machine learning can produce agents that exhibit super-human performance, their rate of learning per opportunity in many domains is decidedly lower than that of human learning. In this work we formalize a theory of Decomposed Inductive Procedure Learning (DIPL) that outlines how different forms of inductive symbolic learning can be used in combination to build agents that learn educationally relevant tasks, such as mathematical and scientific procedures, at a rate similar to human learners. We motivate the construction of this theory along Marr's concepts of the computational, algorithmic, and implementation levels of cognitive modeling, and outline at the computational level six learning capacities that must be achieved to accurately model human learning. We demonstrate that agents built along the DIPL theory are amenable to satisfying these capacities, and demonstrate, both empirically and theoretically, that DIPL enables the creation of agents that exhibit human-like learning performance.

agent, mechanism, simulated learner, (14 more...)

arXiv.org Artificial Intelligence

2110.13233

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre:

Instructional Material (0.67)
Research Report (0.63)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
(4 more...)

Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?

Rozanova, Julia, Ferreira, Deborah, Dubba, Krishna, Cheng, Weiwei, Zhang, Dell, Freitas, Andre

arXiv.org Artificial Intelligence, Sep-17-2021

Models designed for intelligent process automation are required to be capable of grounding user interface elements. This task of interface element grounding is centred on linking instructions in natural language to their target referents. Even though BERT and similar pre-trained language models have excelled in several NLP tasks, their use has not been widely explored for the UI grounding domain. This work concentrates on testing and probing the grounding abilities of three different transformer-based models: BERT, RoBERTa and LayoutLM. Our primary focus is on these models' spatial reasoning skills, given their importance in this domain. We observe that LayoutLM has a promising advantage for applications in this domain, even though it was created for a different original purpose (representing scanned documents): the learned spatial features appear to be transferable to the UI grounding setting, especially as they demonstrate the ability to discriminate between target directions in natural language instructions.

dataset, interface element, reasoning, (14 more...)

arXiv.org Artificial Intelligence

2109.08634

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Neural Networks Art: Solving Problems with Multiple Solutions and New Teaching Algorithm

#artificialintelligence, Jul-22-2020, 08:10:38 GMT

The human brain continuously processes information flowing in from the external environment. Yet it can modify and update stored images, and create new ones, without destroying what was previously memorized. In this it differs significantly from most neural networks (NNs): in networks trained by backpropagation or genetic algorithms, in bidirectional associative memories, Hopfield networks, and similar models, a new training pattern, situation, or association very often distorts or even destroys the fruits of prior learning, requiring changes to a significant share of the connection weights or complete retraining of the network [1-4]. The inability of these NNs to solve the stability-plasticity problem, that is, the problem of perceiving and memorizing new information without losing or distorting existing information, was one of the main reasons for developing fundamentally new neural network configurations. Examples of such networks are those derived from adaptive resonance theory (ART), developed by Carpenter and Grossberg [5, 6].
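
The stability-plasticity behaviour attributed to ART networks can be sketched with a simplified ART1-style clustering of binary patterns. This is a textbook-style simplification with fast learning, not the teaching algorithm proposed in the article; the function names and the `vigilance` and `beta` values are assumptions for this example, and every input pattern is assumed to contain at least one 1.

```python
def overlap(a, b):
    """Number of positions where both binary vectors are 1."""
    return sum(x & y for x, y in zip(a, b))


def art1_cluster(patterns, vigilance=0.6, beta=1.0):
    """Cluster binary patterns with a simplified ART1-style procedure.

    A pattern joins an existing category only if the fraction of its active
    bits covered by that category's prototype reaches `vigilance`; otherwise
    a new category is created, leaving existing prototypes untouched.
    """
    prototypes = []
    assignments = []
    for pattern in patterns:
        # Try categories in order of the choice function |p AND w| / (beta + |w|).
        candidates = sorted(
            range(len(prototypes)),
            key=lambda j: -overlap(pattern, prototypes[j]) / (beta + sum(prototypes[j])),
        )
        for j in candidates:
            match = overlap(pattern, prototypes[j]) / sum(pattern)
            if match >= vigilance:
                # Resonance: fast learning shrinks the prototype to the intersection.
                prototypes[j] = [x & y for x, y in zip(pattern, prototypes[j])]
                assignments.append(j)
                break
        else:
            # No category passed the vigilance test: commit a new one (plasticity).
            prototypes.append(list(pattern))
            assignments.append(len(prototypes) - 1)
    return assignments, prototypes


patterns = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
assignments, prototypes = art1_cluster(patterns)
```

In this run the third pattern fails the vigilance test against the first category and is stored as a new one, so the first prototype survives unchanged and the fourth pattern is still recognised as belonging to it.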

artificial intelligence, machine learning, neuron, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Scout: Rapid Exploration of Interface Layout Alternatives through High-Level Design Constraints

Swearngin, Amanda, Wang, Chenglong, Oleson, Alannah, Fogarty, James, Ko, Amy J.

arXiv.org Artificial Intelligence, Jan-15-2020

Although exploring alternatives is fundamental to creating better interface designs, current processes for creating alternatives are generally manual, limiting the alternatives a designer can explore. We present Scout, a system that helps designers rapidly explore alternatives through mixed-initiative interaction with high-level constraints and design feedback. Prior constraint-based layout systems use low-level spatial constraints and generally produce a single design. To support designer exploration of alternatives, Scout introduces high-level constraints based on design concepts (e.g., semantic structure, emphasis, order) and formalizes them into low-level spatial constraints that a solver uses to generate potential layouts. In an evaluation with 18 interface designers, we found that Scout: (1) helps designers create more spatially diverse layouts with similar quality to those created with a baseline tool and (2) can help designers avoid a linear design process and quickly ideate layouts they do not believe they would have thought of on their own.
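
The step from high-level constraints to low-level spatial values can be illustrated with a toy generator for single-column layouts. It replaces Scout's solver with brute-force enumeration, and everything in it is an assumption for this sketch: the function name `generate_layouts`, the `before` and `emphasized` parameters standing in for order and emphasis constraints, and the pixel values.

```python
from itertools import permutations


def generate_layouts(elements, before=(), emphasized=(),
                     base_height=40, emphasis_scale=2, gap=8):
    """Enumerate vertical layouts that satisfy simple high-level constraints.

    before: pairs (a, b) meaning element a must appear above element b.
    emphasized: names of elements that should be drawn larger.
    Each layout is a list of (name, y, height) tuples, i.e. the low-level
    spatial values implied by one admissible ordering.
    """
    layouts = []
    for order in permutations(elements):
        position = {name: index for index, name in enumerate(order)}
        # Discard orderings that violate an "a above b" constraint.
        if any(position[a] > position[b] for a, b in before):
            continue

        y = 0
        layout = []
        for name in order:
            # Emphasis is translated into a larger height.
            height = base_height * emphasis_scale if name in emphasized else base_height
            layout.append((name, y, height))
            y += height + gap
        layouts.append(layout)
    return layouts


layouts = generate_layouts(
    ["title", "image", "button"],
    before=[("title", "button")],
    emphasized={"title"},
)
```

Here three of the six possible orderings satisfy the order constraint, which gives a designer several alternatives instead of a single result. Enumeration only works for a handful of elements, which is one reason a real system would hand the constraints to a solver.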

designer, layout, scout, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3313831.3376593

2001.05424

Country:

North America > United States > Washington > King County > Seattle (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
